

INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN ELECTRICAL, ELECTRONICS, INSTRUMENTATION AND CONTROL ENGINEERING ol. 4. Issue 6. June 2016

# High Performance FIR Filter Design using Efficient Carry select Adder and Booth Multiplier

Amal George<sup>1</sup>, B.Sakthivel<sup>2</sup>

PG Scholar, M.E VLSI Design, CIET, Coimbatore, India<sup>1</sup>

Asst Professor, ECE Department, CIET, Coimbatore, India<sup>2</sup>

Abstract: Finite Impulse Response filters square measure the foremost vital component in signal process and communication. FIR filter design consists of multipliers, adder and delay unit. FIR filter performance is principally supported adder and multiplier unit. Here we tend to square measure victimization space delay power economical carry choose adder and changed booth number. Within the carry select adder (CSLA) we've got eliminated all the redundant logic operations gift within the typical CSLA and projected a brand new logic formulation for CSLA. An economical CSLA style is obtained victimization optimized logic units. The projected CSLA style involves considerably less space and delay than the recently projected BEC-based CSLA. Attributable to the tiny carry-output delay, the projected CSLA style could be a smart candidate for square-root (SQRT) CSLA. Changed Booth is double as quick as Booth rule. It produces solely 0.5 the amount of partial product (PPs) when put next with a standard binary multiplication. Changed Booth cryptography (MBE) theme is known because the best Booth cryptography and secret writing theme.

Keywords: Carry select adder, square root carry select adder, binary to excess 1 converter, partial products, finite impulse response.

## **I. INTRODUCTION**

Low power area efficient high performance VLSI systems The CSLA has two units: 1) the sum and carry generation are now increasingly used in various fields. Digital signal unit (SCG) and 2) the sum and carry selection unit. The processing technology and its advancements have SCG unit consumes most of the logic resources of CSLA dramatically impacted our modern society everywhere. Kim and Kim [4] used one RCA and one add-one circuit Without DSP we would not have digital audio and speech, digital telephone, automobile industry, medical imaging equipment, multimedia application. Finite impulse response (FIR) filtering is one of the most widely used operations in DSP Digital signal processing in radar applications are usually done using FIR filters. Basically there are two types of filters-analog and digital. Digital filters have the potential to attain much better signal to noise ratio than analog filters. FIR and IIR filters are the two common forms of filter forms. Drawback of IIR filters is that the closed-form IIR design are preliminary limited to low pass, band pass and high pass filters.

FIR filters can have precise linear phase. This paper describes an approach to implementation of high performance FIR filter using area-delay-power efficient carry select adder. An adder is a main component of an arithmetic unit. A complex digital signal processing (DSP) system involves several adders. An efficient adder improves the performance of a complex DSP system. There are several types of adders are available. A ripple carry adder uses simple design but the carry propagation delay (CPD) is main concern in this adder. Carry look ahead and carry select (CS) methods have been used to reduce CPD of adders.

A conventional carry select adder (CSLA) is an RCA-RCA configuration that generate a pair of sum words and output carry bits corresponding anticipated input carry  $(c_{in}=0 \text{ and } 1)$  and selects each pair of final sum and final redundant logic operations. Based on this analysis, we output carry.it independently generating multiple carries have proposed a logic formulation for the CSLA. The and then select a carry to generate a sum.

instead of two RCAs, where the add one circuit is implemented using a multiplexer (MUX). He et al. [5] proposed a square-root (SQRT)-CSLA to implement large bit-width adders with less delay. In a SQRT CSLA, CSLAs with increasing size are connected in a cascading structure. The main objective of SQRT-CSLA design is to provide a parallel path for carry propagation that helps to reduce the overall adder delay. Ramkumar and Kittur [6] suggested a binary to BEC-based CSLA. The BEC-based CSLA involves less logic resources than the conventional CSLA, but it has marginally higher delay. A CSLA based on common Boolean logic (CBL) is also proposed in [7] and [8]. The CBL-based CSLA of [7] involves significantly less logic resource than the conventional CSLA but it has longer CPD, which is almost equal to that of the RCA. To overcome this problem, a SQRT-CSLA based on CBL was proposed in [8].

However, the CBL-based SQRTCSLA design of [8] requires more logic resource and delay than the BECbased SQRT-CSLA of [6]. We observe that logic optimization largely depends on availability of redundant operations in the formulation, whereas adder delay mainly depends on data dependence. In the existing designs, logic is optimized without giving any consideration to the data dependence. In this brief, we made an analysis on logic operations involved in conventional and BEC-based CSLAs to study the data dependence and to identify main contribution in this paper is logic formulation based



INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN ELECTRICAL, ELECTRONICS, INSTRUMENTATION AND CONTROL ENGINEERING Vol. 4, Issue 6, June 2016

on data dependence and optimized carry generator (CG) and CS design.

Multipliers consume the most amount of area in a FIR filter design. Product of two numbers has twice the original bit width of the multiplied numbers. We can truncate the product bits to the required precision to reduce the area cost [1]-[2]. Conventional multipliers are replaced by a modified Booth multiplier here. Modified Booth is twice as fast as Booth algorithm. It produces only half the number of partial products (PPs) when compared with an ordinary binary multiplication. Modified Booth encoding (MBE) scheme is identified as the most efficient Booth encoding and decoding scheme. The truncation error for a modified Booth multiplication is not more than 1 ULP (unit of last place or unit of least precision). So there is no need of error compensation circuits.

#### **II. SYSTEM ANALYSIS**

1. LOGIC EXPRESSION FOR FIR FILTER DESIGN The following convolution shows the performance of Ntap FIR filters:

$$Y_{j} = \sum_{K=0}^{N-1} C_{K} X_{J-K}$$
(1)

Where

 $C_K$  is the coefficient of the filter,  $X_j$  and  $Y_j$  are the jth terms of the input and output sequences, respectively. The Z-transform of Equation (1) is given below:

$$Y(z) = H(z) X(z)$$
 (2)

Where Y (z) and X (z) are the z-transforms of the output and input filters respectively and H (z) is the transfer function.

By multiplying H (z) the frequency response is not changed.

For example the z-transfer of first order low pass FIR filter is given by ( $\infty = -1$ ,  $\beta = 1$ , m = 1) the equation is as follows:

$$Y (Z) = \frac{[1-Z^{-1}]H(Z)}{[1-Z^{-1}]} X (Z)$$
(3)  
$$Y(Z)Y(Z)Z^{-1} = [H(Z)X(Z)-H(Z)X(Z)Z^{-1}]$$
(4)

According to equation (1) and (5) the transformed filter can be expressed as

$$Y_{j}-Y_{j-1} = \sum_{K=0}^{N-1} C_{K} X_{j-K} \sum_{K=0}^{N-1} C_{K} X_{j-K-1}$$
(5)

For the first order differential coefficients, differences between adjacent coefficients are used to obtain the filter outputs are clearly shown in equation (5) and the previous filter output. In order to realize the transformed filter requires one additional multiplication and subtraction operation.

## 2. LOGIC FORMULATION FOR ADDER DESIGN

The carry select adder (CSLA) has two units: 1) the sum and carry generator unit (SCG) and 2) the sum and carry selection unit. The SCG unit consumes most of the logic resources of CSLA and significantly contribute to the consists of one HSG unit, one FSG unit, one CG unit and critical path. Different logic designs have been suggested for efficient implementation of the SCG unit.



As shown in figure the SCG unit of the conventional CSLA is composed of two n-bit RCAs, where n is the adder bit width. Logic operation of n-bit RCA taken in four stages: 1) Half sum generation (HSG); 2) Half carry generation (HCG); Full sum generation (FSG); 4) Full carry generation (FCG). Suppose two n-bit operands are added in the conventional CSLA, then RCA-1 and RCA-2 of the SCG unit of the n-bit CSLA



Figure 1(b): Logical operation of RCA

$$S_0^0(i) = A(i) \oplus B(i)$$
  $C_0^0(i) = A(i).B(i)$  (6)

$$S_1^0(i) = S_0^0(i) \oplus C_1^0(i-1)$$
(7)

$$C_1^0(i) = C_0^0(i) + S_0^0(i) \cdot C_1^0(i-1)$$
(8)

$$C_{0tt}^{0} = C_{1}^{0}(n-1)$$

$$S_{0}^{1}(i) = A(i) \oplus B(i)$$

$$C_{0}^{1}(i) = A(i) . B(i)$$

$$S_{1}^{1} = S_{0}^{1}(i) \oplus C_{1}^{1}(i-1)$$
(9)

$$C_1^1(i) = C_0^1(i) + S_0^1(i) \cdot C_1^1(i-1)$$
(10)

$$C_{out}^1 = C_1^1(n-1)$$
(11)

These redundant logic operations can be removed to have an optimized logic design for RCA-2, in which the HSG and HCG unit is shared to construct RCA2.

## **3. PROPOSED ADDER DESIGN**

The proposed CSLA structure is given in the figure 3. It one CS unit. The CG unit composed of two CGs (CG0 and CG1) corresponding to the input carry '0' and '1'. The

#### DOI 10.17148/IJIREEICE.2016.4609



INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN ELECTRICAL, ELECTRONICS, INSTRUMENTATION AND CONTROL ENGINEERING Vol. 4, Issue 6, June 2016

HSG receives two n-bit operands (A and B) and generate half sum word  $s_0$  and half carry word  $c_0$  of width n-bits each. Both CG0 and CG1 receive  $s_0c_0$  from HSG unit and generate two n-bit full carry words  $C_1^0$  and  $C_1^1$ corresponding to the input carry '0' and '1', respectively. The CS unit selects one final carry word from the two carry words available at the input line using the control signalc<sub>in</sub>. It selects c when  $c_{in}=0$ ; otherwise, it select $c_1^1$ . The CS unit can be implemented using an n-bit 2-to-1 MUX. However, we find from the truth table of the CS unit that carry words  $c_{01}$  and  $c_{11}$  follow a specific bit pattern.



Figure 2: Proposed Carry select adder (CSLA) design, where 'n' is the input operand bit width

(i) = '1', then  $c_{11}$ 

(i) = 1, irrespective of S0(i) and C0(i), for  $0 \le i \le n - 1$ . This feature is used for logic optimization of the CS unit. The optimized design of the CS unit is composed of n AND–OR gates. The final carry word c is obtained from the CS unit. The MSB of c is sent to output as  $c_{out}$ , and (n - 1) LSBs are XOR with (n- 1) MSBs of half sum (s0) in the FSG to obtain (n - 1) MSBs of final-sum (s). The LSB of s0 is XOR with  $C_{in}$  to obtain the LSB of S.

#### a) HSG unit

The logic diagram of HSG unit is shown in the figure 4. It consists of combinational logic AND gate and OR gate.



#### b) CG0 & CG1 Unit

The logic circuit of CG0 and CG1 are optimized to take advantage of the fixed carry bit. The optimized designs of CG0 and CG1 are shown in Figures. It consist of logical or and logical and gates.





## c) CS Unit

The CS Unit is referred as carry select unit and it consist of combination of logical OR and Logical AND gates.



### d) FSG Unit

The FSG Unit Referred as final-sum generation unit. It consists of Combination of XOR gates. It performs Sum operation.



## 4. MULTIPLIER DESIGN

Multiplication consists of three steps: 1) the first step to generate the partial products; 2) the second step to add the generated partial products until the last two rows are remained; 3) the third step to compute the final multiplication results by adding the last two rows.

#### **Booth multiplier**

Booth algorithm is a powerful algorithm for signednumber multiplication, which treats both positive and negative numbers uniformly. The delay of multiplier is determined mainly by the number of additions to be performed. The booth multiplier makes use of booth encoding algorithm in order to reduce the number of partial products by considering two bits at a time, thereby achieving a speed advantage over other multiplier architectures.

The key to Booth's insight is in his classifying groups of bits into the beginning, the middle, or the end of a run of 1s. It starts with an assumed bit 0 bit to the right of the rightmost bit for the first stage. Since we are dealing with signed numbers, the one other requirement is that shifting the product right must preserve the sign of the



INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN ELECTRICAL, ELECTRONICS, INSTRUMENTATION AND CONTROL ENGINEERING Vol. 4. Issue 6. June 2016

intermediate result. The solution is to extend the sign Initially, a '0' is placed to the right most bit of the when the product is shifted to the right. This shift is called multiplier. an arithmetic right shift. Arithmetic right shift is an Then 3 bits of the multiplicand is recoded according to the arithmetic shift to right by 0 to 32 places. The vacated bits following equation, at the most significant end of the word due to the shifting are filled with zeros if the original value (the source operand) was positive and the vacated bits are filled with ones if the original value was negative.

#### 5. MODIFIED BOOTH MULTIPLIER

Booth algorithm is a method that will reduce the number of multiplicand multiples, i.e. the booth encoding algorithm reduces the number of rows of partial products generated in a multiplication. For a given range of numbers to be represented, a higher representation in radix leads to fewer digits. Since a k-bit binary number can be interpreted as K/2-digit radix-4 number, a K/3-digit radix-8 number, and so on, it can deal with more than one bit of the multiplier in each cycle by using high radix multiplication. If multiplication is done in radix 4, in each step, the partial product term  $(B_{i+1}B_i)$  2A needs to be formed and added to the cumulative partial product. Figure shift left one bit the two's complement of the multiplicand 1.3 shows the radix-4 multiplication. The basic block value and multiply by '2' implies just shift left the diagram of the multiplier is shown in figure 8.

$$Z_i = -2x_{i+1} + x_i + x_{i-1}$$
(12)

The two's complement representations of X and Y can be expressed as given below,

$$X = -x_{n-1}2^{i} + \sum_{\substack{i=0\\n-2}}^{n-2} x_{i}2^{i}$$
(13)  
$$Y = -y_{n-1}2^{i} + \sum_{\substack{i=0\\i=0}}^{n-2} y_{i}2^{i}$$
(14)

For product generator, multiply by zero means the multiplicand is multiplied by '0'. Multiply by '1' means the product still remains the same as the multiplicand value. Multiply by '-1' means that the product is the two's complement form of the number. Multiply by '-2' is to multiplicand by one place.



Figure 7: Block diagram of modified booth multiplier

#### **III. PERFORMANCE ANALYSIS**

existing CSLA including the conventional one for input We can find that the proposed CSLA involves nearly 29% bit-widths 8 and 16.

We have considered all the gates to be made of 2-input For the single-stage CSLA, the input-carry delay is and, 2-input or, and inverter (AOI). We have estimated the assumed to be t = 0 and the delay of final-sum (fs) area and delay complexities of the proposed CSLA and the represents the adder delay. The estimated values are listed. less area and 5% less output delay. Compared with the



INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN ELECTRICAL, ELECTRONICS, INSTRUMENTATION AND CONTROL ENGINEERING Vol. 4. Issue 6. June 2016

CBL-based CSLA, the proposed CSLA design has Multiplier with proposed area delay power efficient CSLA marginally less ADP. However, in the CBL-based CSLA, has better area reduction and low power consumption. In delay increases at a much higher rate than the proposed the fixed point calculation of FIR (Finite Impulse CSLA design for higher bit widths. Compared with the conventional CSLA, the proposed CSLA involves 0.42 ns outputs is commonly used i.e., When the bit widths of more delay, but it involves nearly 28% less ADP due to less area complexity. Interestingly, the proposed CSLA design offers multipath parallel carry propagation, whereas the CBL-based CSLA offers a single carry propagation path identical to the RCA design. Moreover, the proposed CSLA design has 0.45 ns less output-carry delay than the output-sum delay. This is mainly due to the CS unit that produces output-carry before the FSG calculates the finalsum.

The modified Booth Multiplier has been incorporated in Digital FIR filter. The FIR filter using the modified booth

Response) filter, full operand bit-width of the multiplier coefficients and data inputs are 8, the multiplier produces 16-bit output. Similarly all other taps are processed and added using 16-bit adder. In this study, a 3-tap Direct Form FIR filter has been designed using modified booth Multiplier and Square Root Carry Select Adder. The proposed FIR filter is compared with the conventional FIR filter using regular booth multiplier and Square Root Carry Select Adder. From the comparison, it can be seen that the proposed FIR filter consumes less area and power than conventional FIR filter.



FIR filter design using regular CSLA and booth multiplier

FIR filter design using efficient CSLA modified booth multiplier

| Fig 8: Performance comparison of FIR filters using regular CSLA with booth multiplier and efficient CSLA with |
|---------------------------------------------------------------------------------------------------------------|
| modified booth multiplier.                                                                                    |

| METHOD                                                               | DELAY(ns) | LUT | POWER (mw) |
|----------------------------------------------------------------------|-----------|-----|------------|
| FIR filter design using regular CSLA and booth multiplier            | 10.825    | 68  | 270        |
| FIR filter design using efficient CSLA and modified booth multiplier | 10.881    | 66  | 251        |

## Fig 9: Performance comparison of FIR filter using regular CSLA with booth multiplier and efficient CSLA with modified booth multiplier.



Fig 10: Block diagram of direct form of fir filter



## IJIREEICE

INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN ELECTRICAL. ELECTRONICS. INSTRUMENTATION AND CONTROL ENGINEERING Vol. 4, Issue 6, June 2016



Fig 11: simulation results of proposed FIR filter using efficient CSLA and modified booth multiplier

### **IV. CONCLUSION**

We have analyzed the logic operations involved in the [12] L. Aksoy, E. Costa, P. Flores, and J. Monteiro, "Exact and conventional CSLA, eliminated all the redundant logic operations of the conventional CSLA and proposed a new logic formulation for the CSLA. In the proposed scheme, the CS operation is scheduled before the calculation of [13] V. G. Oklobdzija, "High-Speed VLSI Arithmetic Units: Adders and final-sum. The proposed CSLA design involves significantly less area and delay than the conventional CSLA. Due to the small carry output delay, the proposed CSLA design is a good candidate for the SQRT adder. The booth multiplier makes use of booth encoding algorithm in order to reduce the number of partial products by considering two bits at a time, thereby achieving a speed advantage over other multiplier architectures. The FIR filter design using efficient carry select adder and modified booth multiplier will helps to increases the performance of the filter in terms of area delay and power.

#### REFERENCES

- [1] K. K. Parhi, VLSI Digital Signal Processing. New York, NY, USA:Wilev.1998.
- [2] A. P. Chandrakasan, N. Verma, and D. C. Daly, "Ultralow-power electronics for biomedical applications," Annu. Rev. Biomed. Eng., vol. 10, pp. 247-274, Aug. 2008.
- [3] O. J. Bedrij, "Carry-select adder," IRE Trans. Electron. Comput., vol. EC-11, no. 3, pp. 340-344, Jun. 1962.
- [4] Y. Kim and L.-S. Kim, "64-bit carry-select adder with reduced area," Electron. Lett., vol. 37, no. 10, pp. 614-615, May 2001.
- Y. He, C. H. Chang, and J. Gu, "An area-efficient 64-bit square root [5] carryselect adder for low power application," in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082-4085.
- [6] B. Ramkumar and H.M. Kittur, "Low-power and area-efficient carry-select adder," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 2, pp. 371-375, Feb. 2012.
- [7] I.-C. Wey, C.-C. Ho, Y.-S. Lin, and C. C. Peng, "An area-efficient carry select adder design by sharing the common Boolean logic term," in Proc. IMECS, 2012, pp. 1-4.
- [8] S.Manju and V. Sornagopal, "An efficient SQRT architecture of carry select adder design by common Boolean logic," in Proc. VLSI ICEVENT, 2013, pp. 1-5.
- [9] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, 2nd ed. New York, NY, USA: Oxford Univ. Press, 2010.
- [10] Kavita and Jasbir Kaur, "Design and Implementation of an Efficient Modified Booth Multiplier using VHDL", Proceedings of 2nd International Conference on Emerging Trends in Engineering and Management, ICETEM 2013
- [11] International Standard : ISO/CD17361 "Intelligent Transport system-Lane Departure warning system Performance Requirements and Test Procedure, June 23, 2003.

- approximate algorithms for the optimization of area and delay in multiple constant multiplications," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 6, pp. 1013-1026, Jun. 2008.
- Multipliers", in "Design of High-Performance Microprocessor Circuits", Book edited by A.Chandrakasan, IEEE Press, 2000